The Hybrid Approach for Handling and Detecting Outliers from Dynamic Data Stream

Author

  • Pragati Patil
Abstract

Outlier detection is currently an active area of research in the data mining community. In this article we propose a hybrid approach to capture outliers in a dynamic data stream. We apply the k-means algorithm, which partitions the data set into a number of chunks, or clusters, each containing a set of data points. Once the clusters are formed, the centroid of each cluster is calculated. Points lying near the centroid of a cluster are not probable outlier candidates, so they can be pruned from each cluster. Next, a distance-based technique is used to find the distance from the centroid to each remaining candidate outlier, and a threshold value is set for this distance. If the distance is greater than the threshold, the point is declared an outlier; otherwise it is treated as a real object. The proposed approach thus combines two techniques to find outliers in the data set efficiently. This hybrid approach incurs lower computational cost: the proposed algorithm efficiently prunes the safe cells and saves a large number of extra calculations.
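The steps described in the abstract (cluster, prune points near each centroid, then apply a distance threshold) can be sketched for a single chunk of the stream as follows. This is a minimal illustration, not the author's implementation. The function names, the `prune_radius` and `threshold` parameters, the fixed iteration count, and the choice of the first `k` points as initial centroids are all assumptions made for the example.

```python
import math


def assign(points, centroids):
    """Group each point with its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iters=20):
    """Plain k-means; initial centroids are the first k points (assumption)."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = assign(points, centroids)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, assign(points, centroids)


def detect_outliers(points, k, prune_radius, threshold):
    """Hypothetical hybrid step: cluster, prune near-centroid points,
    then flag remaining candidates farther than `threshold`."""
    centroids, clusters = kmeans(points, k)
    outliers = []
    for centroid, members in zip(centroids, clusters):
        # Pruning: points within prune_radius of the centroid are "safe".
        candidates = [p for p in members
                      if math.dist(p, centroid) > prune_radius]
        # Distance-based test on the surviving candidates only.
        outliers.extend(p for p in candidates
                        if math.dist(p, centroid) > threshold)
    return outliers


# Two tight groups plus one distant point (made-up data for illustration).
sample = [(0, 0), (10, 10), (0, 1), (1, 0), (1, 1),
          (10, 11), (11, 10), (11, 11), (10.5, 18)]
print(detect_outliers(sample, k=2, prune_radius=1.5, threshold=4.0))
```

In this sketch the pruning step only narrows the candidate set before the threshold test. For a data stream, the same function could be called on each incoming chunk. Note that k-means is sensitive to initialization: an isolated point chosen as an initial centroid can end up as its own cluster and go undetected.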


Similar resources

Dynamic DEA: A Hybrid Measure Approach

In real-world applications, there are situations where inputs and outputs are time-dependent and are affected during the production periods. Capital stock is a good example of such a case. To handle long-term planning, a dynamic structure was proposed for efficiency evaluation. In this framework, there are some of the inputs and outputs change proportiona...


Identification of outliers types in multivariate time series using genetic algorithm

Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased parameter estimates, and inaccurate predictions. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...


Detecting Outliers in Exponentiated Pareto Distribution

In this paper, we use two statistics for detecting outliers in the exponentiated Pareto distribution. These statistics are extensions of the statistics for detecting outliers in exponential and gamma distributions. In fact, we compare the power of our test statistics based on the simulation study and identify the better test statistic for detecting outliers in the exponentiated Pareto distribution. At t...


A Tunned-parameter Hybrid Algorithm for Dynamic Facility Layout Problem with Budget Constraint using GA and SAA

A facility layout problem is concerned with determining the best position of departments, cells, or machines on the plant. An efficient layout contributes to the overall efficiency of operations. It’s been proved that, when system characteristics change, it can cause a significant increase in material handling cost. Consequently, the efficiency of the current layout decreases or is lost and it ...


Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...



Publication date: 2016